PROBABILITY ASYMPTOTICS: NOTES ON NOTATION 



SVANTE JANSON 

Abstract. We define and compare several different versions of the O 
and o notations for random variables. The main purpose is to give 
proper definitions in order to avoid ambiguities and mistakes. 



1. Introduction 

There are many situations where one studies asymptotics of random variables or events, and it is therefore important to have good definitions and notations for random asymptotic properties. Probabilists often use the standard concepts convergence almost surely ($\xrightarrow{\mathrm{a.s.}}$), convergence in probability ($\xrightarrow{\mathrm{p}}$) and convergence in distribution ($\xrightarrow{\mathrm{d}}$); see any textbook in probability theory for definitions. (Two of my favorite references, at different levels, are Gut [2] and Kallenberg [4].)

Other notations, often used in, for example, discrete probability such as probabilistic combinatorics, are probabilistic versions of the $O$ and $o$ notation. These notations are very useful; however, several versions exist with somewhat different definitions (some equivalent and some not), so some care is needed when using them. In particular, I have for many years avoided the notations "$O(\cdot)$ w.h.p." and "$o(\cdot)$ w.h.p." on the grounds that these combine two different asymptotic notations in an ambiguous and potentially dangerous way. (In which order do the quantifiers really come in a formal definition?) I have now changed my opinion, and I regard these as valid and useful notations, provided proper definitions are given. One of the purposes of these notes is to state such definitions explicitly (according to my interpretations of the notions; I hope that others interpret them in the same way). Moreover, various relations and equivalences between different notions are given.

The results below are all elementary and more or less well-known. I do not think that any results are new, and they are in any case at the level of exercises in probability theory rather than advanced theorems. Nevertheless, I hope that this collection of various definitions and relations may be useful to myself and to others who use these concepts. (See also the similar discussion in [3, Section 1.2] of many of these, and some further, notions.)
We suppose throughout that $X_n$ are random variables and $a_n$ positive numbers, $n = 1, 2, \dots$; unless we say otherwise, the $X_n$ do not have to be defined on the same probability space. (In other words, only their distributions matter.) All unspecified limits are as $n \to \infty$.

All properties below relating $X_n$ and $a_n$ depend only on $X_n/a_n$; we could thus normalize and assume that $a_n = 1$, but for convenience in applications, we will state the results in the more general form with arbitrary positive $a_n$.

2. $O$ AND $o$

We begin with the standard definitions for non-random sequences. Assume that $b_n$ is some sequence of numbers.

(D1) $b_n = O(a_n)$ if there exist constants $C$ and $n_0$ such that $|b_n| \le C a_n$ for $n \ge n_0$. Equivalently,

$$b_n = O(a_n) \iff \limsup_{n\to\infty} \frac{|b_n|}{a_n} < \infty. \tag{1}$$

(D2) $b_n = o(a_n)$ if $b_n/a_n \to 0$. Equivalently, $b_n = o(a_n)$ if for every $\varepsilon > 0$ there exists $n_\varepsilon$ such that $|b_n| \le \varepsilon a_n$ for $n \ge n_\varepsilon$.

Remark 1. When considering sequences as here, the qualifier "$n \ge n_0$" is not really necessary in the definition of $O(\cdot)$, and it is often omitted, which is equivalent to replacing $\limsup$ by $\sup$ in (1). The only effect of using an $n_0$ is to allow us to have $a_n$ or $b_n$ undefined or infinite, or $a_n = 0$, for some small $n$; for example, we may write $O(\log n)$ without making an explicit exception for $n = 1$. Indeed, if everything is well defined and $a_n > 0$, as we assume in these notes, and $|b_n| \le C a_n$ for $n \ge n_0$, then $\sup_n |b_n/a_n| \le \max\bigl(C, \max_{j < n_0} |b_j/a_j|\bigr) < \infty$.

On the other hand, when considering functions of a continuous variable, the two versions of $O(\cdot)$ are different and should be distinguished. (Both versions are used in the literature.) For example, there is a difference between the conditions $f(x) = O(x)$ on $(0,1)$ (meaning $\sup_{0<x<1} |f(x)/x| < \infty$, i.e., a uniform estimate on $(0,1)$), and $f(x) = O(x)$ as $x \to 0$ (meaning $\limsup_{x\to 0} |f(x)/x| < \infty$, an asymptotic estimate for small $x$); the former but not the latter entails that $f$ is bounded also close to 1. (As shown here, when necessary, the two versions of $O$ can be distinguished by adding qualifiers such as "$n \to \infty$" or "$x \to 0$" for the asymptotic version and "$n \ge 1$" or "$x \in (0,1)$" for the uniform version. Often, however, such qualifiers are omitted when the meaning is clear from the context.)
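As a numerical illustration of the difference in Remark 1, the Python sketch below uses the hypothetical function $f(x) = x/(1-x)$, chosen only for this purpose. Here $f(x)/x = 1/(1-x)$, so $f(x) = O(x)$ as $x \to 0$, but $f(x) = O(x)$ fails on $(0,1)$ because the ratio is unbounded as $x \to 1$. Sampling on a finite grid can only suggest such statements, not prove them.

```python
def ratio(x: float) -> float:
    """|f(x)/x| for the illustrative choice f(x) = x / (1 - x)."""
    f = x / (1.0 - x)
    return abs(f / x)

# Asymptotic version (x -> 0): the ratio stays close to 1.
near_zero = [ratio(10.0 ** -k) for k in range(1, 8)]

# Uniform version on (0, 1): the ratio blows up as x -> 1.
near_one = [ratio(1.0 - 10.0 ** -k) for k in range(1, 8)]

print(max(near_zero))  # about 1.11, attained at x = 0.1
print(near_one[-1])    # roughly 1e7
```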

3. Convergence in probability 
The standard definition of convergence in probability is as follows. 
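(D3) $X_n \xrightarrow{\mathrm{p}} 0$ if for every $\varepsilon > 0$, $P(|X_n| > \varepsilon) \to 0$. Equivalently,

$$X_n \xrightarrow{\mathrm{p}} 0 \iff \sup_{\varepsilon>0} \limsup_{n\to\infty} P(|X_n| > \varepsilon) = 0.$$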
Remark 2. More generally, one defines $X_n \xrightarrow{\mathrm{p}} a$ for a constant $a$ similarly, or by $X_n \xrightarrow{\mathrm{p}} a \iff X_n - a \xrightarrow{\mathrm{p}} 0$. If the random variables $X_n$ are defined on the same probability space, one further defines $X_n \xrightarrow{\mathrm{p}} X$ for a random variable $X$ (defined on that probability space) by $X_n \xrightarrow{\mathrm{p}} X$ if $X_n - X \xrightarrow{\mathrm{p}} 0$.

It is well-known that convergence in probability to a constant is equivalent to convergence in distribution to the same constant. (See e.g. [1; 2; 4] for definition and equivalent characterizations of convergence in distribution.) In particular,

$$X_n \xrightarrow{\mathrm{p}} 0 \iff X_n \xrightarrow{\mathrm{d}} 0. \tag{2}$$
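To make (D3) concrete, the sketch below assumes, purely for illustration, that $X_n$ is normal with mean 0 and variance $1/n$. Then $P(|X_n| > \varepsilon) = \operatorname{erfc}(\varepsilon\sqrt{n/2})$, which tends to 0 for every fixed $\varepsilon > 0$, so $X_n \xrightarrow{\mathrm{p}} 0$; smaller $\varepsilon$ simply needs larger $n$ before the probability becomes small. The distribution and the helper name `tail_prob` are assumptions of this example, not part of the notes.

```python
import math

def tail_prob(n: int, eps: float) -> float:
    """P(|X_n| > eps) when X_n ~ N(0, 1/n), i.e. X_n = Z / sqrt(n).

    Uses P(|Z| > t) = erfc(t / sqrt(2)) for a standard normal Z.
    """
    return math.erfc(eps * math.sqrt(n) / math.sqrt(2.0))

for eps in (0.5, 0.1, 0.01):
    probs = [tail_prob(n, eps) for n in (1, 10, 100, 10_000, 1_000_000)]
    print(eps, probs)  # each row decreases towards 0
```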

4. With high probability 

For events, we are in particular interested in typical events, i.e., events that occur with probability tending to 1 as $n \to \infty$. Thus, we consider an event $\mathcal{E}_n$ for each $n$, and say that:

(D4) $\mathcal{E}_n$ holds with high probability (w.h.p.) if $P(\mathcal{E}_n) \to 1$ as $n \to \infty$.

This too is a common and useful notation.

Remark 3. A common name in probabilistic combinatorics for this property has been "almost surely" or "a.s.", but that conflicts with the well-established use of this phrase (and abbreviation) in probability theory, where it means probability equal to 1. In my opinion, "almost surely" (a.s.) should be reserved for its probabilistic meaning, since giving it a different meaning might lead to confusion. (In these notes, a.s. is used in the standard sense.) Another alternative name for (D4) is "asymptotically almost surely" or "a.a.s.". This name is commonly used, for example in [5], and the choice between the synonymous "w.h.p." (often written whp) and "a.a.s." is a matter of taste. (At present, I prefer w.h.p., so I use it here.)



Definition (D3) of convergence in probability can be stated using w.h.p. as:

$$X_n \xrightarrow{\mathrm{p}} 0 \iff \text{for every } \varepsilon > 0,\ |X_n| \le \varepsilon \text{ w.h.p.} \tag{3}$$

5. $O_p$ AND $o_p$

A probabilistic version of O that is frequently used is the following: 

(D5) $X_n = O_p(a_n)$ if for every $\varepsilon > 0$ there exist constants $C_\varepsilon$ and $n_\varepsilon$ such that $P(|X_n| \le C_\varepsilon a_n) > 1 - \varepsilon$ for every $n \ge n_\varepsilon$.

In other words, $X_n/a_n$ is bounded, up to an exceptional event of arbitrarily small (but fixed) positive probability. This is also known as $X_n/a_n$ being bounded in probability.

The definition (D5) can be rewritten in equivalent forms, for example as follows.

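Lemma 1. The following are equivalent:

(i) $X_n = O_p(a_n)$.
(ii) For every $\varepsilon > 0$ there exists $C_\varepsilon$ such that $P(|X_n| \le C_\varepsilon a_n) > 1 - \varepsilon$ for every $n$.
(iii) For every $\varepsilon > 0$ there exists $C_\varepsilon$ such that $\limsup_{n\to\infty} P(|X_n| > C_\varepsilon a_n) < \varepsilon$.
(iv) For every $\varepsilon > 0$ there exists $C_\varepsilon$ such that $\sup_n P(|X_n| > C_\varepsilon a_n) < \varepsilon$.
(v) $\lim_{C\to\infty} \limsup_{n\to\infty} P(|X_n| > C a_n) = 0$.
(vi) $\lim_{C\to\infty} \sup_n P(|X_n| > C a_n) = 0$.

Proof. (i) $\Rightarrow$ (ii) follows by increasing $C_\varepsilon$ in (D5) such that $P(|X_n| \le C_\varepsilon a_n) > 1 - \varepsilon$ for $n = 1, \dots, n_\varepsilon$ too. (ii) $\Rightarrow$ (i) is trivial. (i) $\iff$ (iii) $\iff$ (v) and (ii) $\iff$ (iv) $\iff$ (vi) are easy and left to the reader. □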



Remark 4. Another term equivalent to "bounded in probability" is tight; thus, $X_n = O_p(a_n)$ if and only if the family $\{X_n/a_n\}$ is tight. By Prohorov's theorem [1; 4], tightness is equivalent to relative compactness of the set of distributions. Hence, $X_n = O_p(a_n)$ if and only if every subsequence of $X_n/a_n$ has a subsequence that converges in distribution; however, different convergent subsequences may have different limits. In particular, if $X_n/a_n$ converges in distribution, then $X_n = O_p(a_n)$.
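Remark 4 can be illustrated with an example where $X_n/a_n$ converges in distribution (here it is even exactly standard normal). Assuming, purely for illustration, that $X_n$ is normal with mean 0 and variance $n$ and that $a_n = \sqrt{n}$, the probability $P(|X_n| > C a_n) = \operatorname{erfc}(C/\sqrt{2})$ does not depend on $n$ and tends to 0 as $C \to \infty$, so Lemma 1(vi) holds and $X_n = O_p(\sqrt{n})$. The sketch below just evaluates this formula; the function name is hypothetical.

```python
import math

def scaled_tail(c: float) -> float:
    """P(|X_n| > c * a_n) for X_n ~ N(0, n) and a_n = sqrt(n).

    Since X_n / a_n is exactly standard normal, this does not depend on n,
    so it also equals sup_n P(|X_n| > c * a_n).
    """
    return math.erfc(c / math.sqrt(2.0))

for c in (1, 2, 4, 8):
    print(c, scaled_tail(c))  # tends to 0 as c grows: Lemma 1(vi) holds
```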

The corresponding $o_p$ notation can be defined as follows.

(D6) $X_n = o_p(a_n)$ if for every $\varepsilon > 0$ there exists $n_\varepsilon$ such that $P(|X_n| \le \varepsilon a_n) > 1 - \varepsilon$ for every $n \ge n_\varepsilon$.
The definition (D6) too has several equivalent forms, for example as follows.



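Lemma 2. The following are equivalent:

(i) $X_n = o_p(a_n)$.
(ii) For every $\varepsilon > 0$, $P(|X_n| > \varepsilon a_n) \to 0$.
(iii) $\sup_{\varepsilon>0} \limsup_{n\to\infty} P(|X_n| > \varepsilon a_n) = 0$.
(iv) For every $\varepsilon > 0$, $|X_n| \le \varepsilon a_n$ w.h.p.
(v) $X_n/a_n \xrightarrow{\mathrm{p}} 0$.

Proof. (i) $\iff$ (ii) follows by standard arguments which we omit, (ii) $\iff$ (iii) is trivial, (ii) $\iff$ (iv) is immediate by the definition (D4) of w.h.p., and (ii) $\iff$ (v) is immediate by the definition (D3) of $\xrightarrow{\mathrm{p}}$. (Further, (iv) $\iff$ (v) follows by (3).) □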



6. Using arbitrary functions $\omega(n)$

Some papers use properties that are stated using an arbitrary function (or sequence) $\omega(n) \to \infty$. (Or, equivalently, stated in terms of an arbitrary sequence $\delta_n := 1/\omega(n) \to 0$; see for example […, Lemma 4.9], which is essentially the same as (i) $\iff$ (iii) in the following lemma.) They are equivalent to $O_p$ or $o_p$ by the following lemmas. (I find the $O_p$ and $o_p$ notation more transparent and prefer it to using $\omega(n)$.)
Lemma 3. The following are equivalent:

(i) $X_n = O_p(a_n)$.
(ii) For every function $\omega(n) \to \infty$, $|X_n| \le \omega(n) a_n$ w.h.p.
(iii) For every function $\omega(n) \to \infty$, $|X_n|/(\omega(n) a_n) \xrightarrow{\mathrm{p}} 0$.

Proof. (i) $\Rightarrow$ (ii). For every $\varepsilon > 0$, choose $C_\varepsilon$ as in Lemma 1(iii). Then $\omega(n) > C_\varepsilon$ for large $n$, and thus

$$\limsup_{n\to\infty} P(|X_n| > \omega(n) a_n) \le \limsup_{n\to\infty} P(|X_n| > C_\varepsilon a_n) < \varepsilon.$$

Hence, $\limsup_{n\to\infty} P(|X_n| > \omega(n) a_n) = 0$, which is (ii).

(ii) $\Rightarrow$ (i). If $X_n = O_p(a_n)$ does not hold, then, by the definition (D5), there exists $\varepsilon > 0$ such that for every $C$ there exist arbitrarily large $n$ with $P(|X_n| > C a_n) > \varepsilon$. We may thus inductively define an increasing sequence $n_k$, $k = 1, 2, \dots$, such that $P(|X_{n_k}| > k a_{n_k}) > \varepsilon$. Define $\omega(n)$ by $\omega(n_k) = k$ and $\omega(n) = n$ for $n \notin \{n_k\}$. Then $\omega(n) \to \infty$ and $P(|X_n| > \omega(n) a_n) \not\to 0$, so (ii) does not hold.

(ii) $\Rightarrow$ (iii). If $\varepsilon > 0$, then $\varepsilon\omega(n) \to \infty$ too, and thus by (ii) $|X_n| \le \varepsilon\omega(n) a_n$ w.h.p. Thus $X_n/(\omega(n) a_n) \xrightarrow{\mathrm{p}} 0$ by (3).

(iii) $\Rightarrow$ (ii). Take $\varepsilon = 1$ in (3). □



Remark 5. Lemma 3 generalizes the corresponding result for a non-random sequence $(b_n)$: $(b_n)$ is bounded $\iff$ $|b_n| \le \omega(n)$ for all large $n$, for every $\omega(n) \to \infty$ $\iff$ $b_n/\omega(n) \to 0$ for every $\omega(n) \to \infty$.

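Lemma 4. The following are equivalent:

(i) $X_n = o_p(a_n)$.
(ii) For some function $\omega(n) \to \infty$, $|X_n| \le a_n/\omega(n)$ w.h.p.
(iii) For some function $\omega(n) \to \infty$, $\omega(n) X_n/a_n \xrightarrow{\mathrm{p}} 0$.

Proof. (i) $\Rightarrow$ (ii). By the definition (D6), for every $k \ge 1$ there exists $n_k$ such that if $n \ge n_k$, then $P(|X_n| > k^{-1} a_n) < k^{-1}$. We may further assume that $n_k > n_{k-1}$, with $n_0 = 1$. Define $\omega(n) = k$ for $n_k \le n < n_{k+1}$, $k \ge 1$, and $\omega(n) = 1$ for $n < n_1$. Then $\omega(n) \to \infty$ and $P(|X_n| > \omega(n)^{-1} a_n) < \omega(n)^{-1}$ for $n \ge n_1$. Since $\omega(n)^{-1} \to 0$, this yields (ii).

(ii) $\Rightarrow$ (i). Let $\varepsilon > 0$. Since $\omega(n)^{-1} < \varepsilon$ for large $n$, (ii) implies that $|X_n| < \varepsilon a_n$ w.h.p. Thus $X_n = o_p(a_n)$ by Lemma 2.

(ii) $\Rightarrow$ (iii). If $\omega(n)$ is as in (ii), then $\omega(n)^{1/2} |X_n|/a_n \le \omega(n)^{-1/2}$ w.h.p.; since $\omega(n)^{-1/2} \to 0$, this implies $\omega(n)^{1/2} X_n/a_n \xrightarrow{\mathrm{p}} 0$, so (iii) holds with the function $\omega(n)^{1/2} \to \infty$.

(iii) $\Rightarrow$ (ii). Take $\varepsilon = 1$ in (3). □

7. $O_{L^p}$ AND $o_{L^p}$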



The following notations are less common but sometimes very useful. Recall that for $0 < p < \infty$ the $L^p$ norm of a random variable $X$ is $\|X\|_{L^p} := (E|X|^p)^{1/p}$. Let $p > 0$ be a fixed number. (In applications, usually $p = 1$ or $p = 2$.)
(D7) $X_n = O_{L^p}(a_n)$ if $\|X_n\|_{L^p} = O(a_n)$.

(D8) $X_n = o_{L^p}(a_n)$ if $\|X_n\|_{L^p} = o(a_n)$.

In other words, $X_n = O_{L^p}(a_n) \iff E|X_n|^p = O(a_n^p)$ and $X_n = o_{L^p}(a_n) \iff E|X_n|^p = o(a_n^p)$; in particular, $X_n = O_{L^1}(a_n) \iff E|X_n| = O(a_n)$ and $X_n = o_{L^1}(a_n) \iff E|X_n| = o(a_n)$.

$X_n = o_{L^1}(a_n)$ thus says that $E|X_n/a_n| \to 0$, which often is expressed as $X_n/a_n \to 0$ in mean. More generally, $X_n = o_{L^p}(a_n)$ is the same as $E|X_n/a_n|^p \to 0$, which is called $X_n/a_n \to 0$ in $p$-mean (or in $L^p$). (For $p = 2$, a common name is $X_n/a_n \to 0$ in square mean.)

We may also take $p = \infty$. Since $L^\infty$ is the space of bounded random variables and $\|X\|_{L^\infty}$ is the essential supremum of $|X|$, i.e., $\|X\|_{L^\infty} := \inf\{C : |X| \le C \text{ a.s.}\}$, the definitions (D7)-(D8) can for $p = \infty$ be rewritten as:

(D9) $X_n = O_{L^\infty}(a_n)$ if there exists a constant $C$ such that $|X_n| \le C a_n$ a.s.

(D10) $X_n = o_{L^\infty}(a_n)$ if there exists a sequence $\delta_n \to 0$ such that $|X_n| \le \delta_n a_n$ a.s.

Remark 6. In applications in discrete probability, typically each $X_n$ is a discrete random variable taking only a finite number of possible values, each with positive probability. In such cases (and more generally if the number of values is countable, each with positive probability), $|X_n| \le C a_n$ a.s. $\iff$ $|X_n| \le C a_n$ surely (i.e., for each realization), and $|X_n| \le \delta_n a_n$ a.s. $\iff$ $|X_n| \le \delta_n a_n$ surely.

The notions $O_{L^p}$ and $o_{L^p}$ are useful for example when considering sums of a growing (or infinite) number of terms, since (for $p \ge 1$) such estimates can be added by Minkowski's inequality. For example, if $X_n = \sum_{i=1}^{n} Y_{ni}$, and $Y_{ni} = O_{L^p}(a_n)$ (uniformly in $i$) for some $p \ge 1$, then $X_n = O_{L^p}(n a_n)$, and similarly for $o_{L^p}$. Note that the corresponding statements for $O_p$ and $o_p$ are false. (Example: Let $Y_{ni}$ be independent with $P(Y_{ni} = n^2) = 1 - P(Y_{ni} = 0) = 1/n$ and let $a_n = 1$. Then $Y_{ni} = o_p(1)$ uniformly in $i$, but $P(X_n \ge n^2) = 1 - (1 - 1/n)^n \to 1 - e^{-1}$, so $X_n$ is not even $O_p(n)$.)
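The failure for $O_p$ in the example above comes down to one exact formula: $X_n \ge n^2$ exactly when at least one of the $n$ independent terms equals $n^2$, so $P(X_n \ge n^2) = 1 - (1 - 1/n)^n$. The Python sketch below (function name hypothetical) evaluates it and compares it with the single-term probability $1/n$.

```python
import math

def prob_any_jump(n: int) -> float:
    """P(X_n >= n**2) = 1 - (1 - 1/n)**n for X_n = sum of n independent Y_ni,
    where P(Y_ni = n**2) = 1/n and P(Y_ni = 0) = 1 - 1/n."""
    return 1.0 - (1.0 - 1.0 / n) ** n

for n in (10, 100, 10_000):
    # Each single term is small in probability: P(Y_ni > 0) = 1/n -> 0,
    # but the sum reaches n**2 (so X_n / n >= n) with probability near 1 - 1/e.
    print(n, 1.0 / n, prob_any_jump(n))

print(1.0 - math.exp(-1.0))  # the limit, about 0.632
```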

By Lyapunov's (or Hölder's) inequality, $X_n = O_{L^p}(a_n) \implies X_n = O_{L^q}(a_n)$ and $X_n = o_{L^p}(a_n) \implies X_n = o_{L^q}(a_n)$ when $0 < q \le p \le \infty$. Thus the estimates become stronger as $p$ increases. They are, for all $p$, stronger than $O_p$ and $o_p$.
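Lyapunov's inequality says that $p \mapsto \|X\|_{L^p}$ is nondecreasing. A small Python sketch for a discrete random variable makes this visible; the distribution below is hypothetical and chosen only for illustration, and $p = \infty$ is handled as the essential supremum.

```python
def lp_norm(values, probs, p):
    """||X||_{L^p} for a discrete X with P(X = values[i]) = probs[i].

    p = float('inf') gives the essential supremum, i.e. the largest |value|
    carrying positive probability.
    """
    if p == float("inf"):
        return max(abs(v) for v, q in zip(values, probs) if q > 0)
    return sum(q * abs(v) ** p for v, q in zip(values, probs)) ** (1.0 / p)

# A hypothetical distribution, chosen only for illustration.
values = [0.0, 1.0, 5.0]
probs = [0.5, 0.4, 0.1]

ps = [0.5, 1.0, 2.0, 4.0, float("inf")]
norms = [lp_norm(values, probs, p) for p in ps]
print(norms)  # nondecreasing in p, as Lyapunov's inequality predicts
```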

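Lemma 5. Let $0 < p \le \infty$. Then $X_n = O_{L^p}(a_n) \implies X_n = O_p(a_n)$ and $X_n = o_{L^p}(a_n) \implies X_n = o_p(a_n)$.

Proof. Immediate from Markov's inequality: for $p < \infty$ and every $C > 0$, $P(|X_n| > C a_n) \le E|X_n|^p/(C a_n)^p$. (The case $p = \infty$ follows from the case $p = 1$ by Lyapunov's inequality.) □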

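The converse fails for every $p > 0$. (Example for any $p > 0$: Take $X_n$ with $P(X_n = e^n) = 1 - P(X_n = 0) = 1/n$ and let $a_n = 1$. Then $X_n = o_p(1)$, but $E|X_n|^p = e^{pn}/n \to \infty$, so $X_n$ is not $O_{L^p}(1)$.)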
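The counterexample can be checked directly: the sketch computes $\log E|X_n|^p = pn - \log n$ (on the log scale, to avoid floating-point overflow of $e^{pn}$) alongside $P(X_n \ne 0) = 1/n$. The helper names are illustrative.

```python
import math

def log_pth_moment(n: int, p: float) -> float:
    """log E|X_n|^p for P(X_n = e**n) = 1/n, P(X_n = 0) = 1 - 1/n.

    E|X_n|^p = e**(p*n) / n, so its logarithm is p*n - log(n).
    """
    return p * n - math.log(n)

def prob_nonzero(n: int) -> float:
    """P(X_n != 0) = 1/n, which tends to 0, so X_n -> 0 in probability."""
    return 1.0 / n

for p in (0.01, 1.0, 2.0):
    # Even for tiny p the log-moment eventually grows without bound.
    print(p, [round(log_pth_moment(n, p), 2) for n in (10, 1000, 100_000)])
```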

Remark 7. For $p < \infty$, $X_n = o_{L^p}(a_n)$ is equivalent to $X_n = o_p(a_n)$ together with the condition that $\{|X_n/a_n|^p\}$ are uniformly integrable, see e.g. [2] or [4].

Another advantage of $O_{L^p}$ and $o_{L^p}$ is that they are strong enough to imply moment estimates:

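Lemma 6. If $k$ is a positive integer with $k \le p$, then $X_n = O_{L^p}(a_n) \implies E X_n^k = O(a_n^k)$ and $X_n = o_{L^p}(a_n) \implies E X_n^k = o(a_n^k)$. □

In particular, $X_n = O_{L^1}(a_n) \implies E X_n = O(a_n)$ and $X_n = o_{L^1}(a_n) \implies E X_n = o(a_n)$; further, $X_n = O_{L^2}(a_n) \implies \operatorname{Var} X_n = O(a_n^2)$ and $X_n = o_{L^2}(a_n) \implies \operatorname{Var} X_n = o(a_n^2)$.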

8. O W.H.P. AND o W.H.P. 

Since the basic meaning of $O$ is "bounded by some fixed but unknown constant", my interpretation of "$O(a_n)$ w.h.p." is the following:

(D11) $X_n = O(a_n)$ w.h.p. if there exists a constant $C$ such that $|X_n| \le C a_n$ w.h.p.



Comparing Definitions (D5) and (D11), we see that the latter is a stronger notion:

$$X_n = O(a_n) \text{ w.h.p.} \implies X_n = O_p(a_n), \tag{4}$$

but the converse does not hold. (In fact, (D11) is the same as (D5) with the restriction that $C_\varepsilon$ must be chosen independent of $\varepsilon$.) For example, if $X_n/a_n \xrightarrow{\mathrm{d}} Y$ for some random variable $Y$, then always $X_n = O_p(a_n)$, see Remark 4, but it is easily seen that $X_n = O(a_n)$ w.h.p. if and only if $Y$ is bounded, i.e., $|Y| \le C$ (a.s.) for some constant $C < \infty$. (In particular, if $X_n = X$ does not depend on $n$, then always $X_n = O_p(1)$, but $X_n = O(1)$ w.h.p. only if $X$ is bounded.) This also shows that $X_n = O_{L^p}(a_n)$, $p < \infty$, in general does not imply $X_n = O(a_n)$ w.h.p.
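For the case $X_n = X$ above, the sketch assumes, for illustration only, that $X$ is exponential with mean 1, so $P(|X| > c) = e^{-c}$. Every tail probability is positive, so no constant $C$ works in (D11), while the tails still tend to 0 as $c \to \infty$, which is what Lemma 1(vi) requires for $O_p(1)$. The function name is hypothetical.

```python
import math

def tail(c: float) -> float:
    """P(|X| > c) for X ~ Exponential(1), an illustrative unbounded choice.

    With X_n = X for every n, P(|X_n| > c) = tail(c) does not depend on n.
    """
    return math.exp(-c)

for c in (1.0, 10.0, 50.0):
    # O_p(1): sup_n P(|X_n| > c) = tail(c) -> 0 as c grows (Lemma 1(vi)).
    # Not O(1) w.h.p.: for each fixed c, P(|X_n| > c) = tail(c) > 0 for all n,
    # so P(|X_n| <= c) never tends to 1.
    print(c, tail(c))
```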

Remark 8. More generally, $X_n = O(a_n)$ w.h.p. if and only if every subsequence of $X_n/a_n$ has a subsequence that converges in distribution to a bounded random variable, with some uniform bound for all subsequence limits.

Remark 9. The property $X_n = O(a_n)$ w.h.p. was denoted $X_n = O_C(a_n)$ in [3]. (A notation that perhaps was not very successful.)

Similarly, the basic meaning of $o$ is "bounded by some fixed but unknown sequence $\delta_n \to 0$"; thus my interpretation of "$o(a_n)$ w.h.p." is the following:

(D12) $X_n = o(a_n)$ w.h.p. if there exists a sequence $\delta_n \to 0$ such that $|X_n| \le \delta_n a_n$ w.h.p.

This condition is the same as Lemma 4(ii) (with $\delta_n = \omega(n)^{-1}$), and thus Lemma 4 implies the following equivalence:

Lemma 7. $X_n = o(a_n)$ w.h.p. $\iff$ $X_n = o_p(a_n)$. □
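The proof of Lemma 4 is constructive, and so is the sequence $\delta_n = \omega(n)^{-1}$ behind Lemma 7. The Python sketch below carries out that construction numerically under an illustrative assumption not made in the notes: $a_n = 1$ and $X_n$ normal with mean 0 and variance $1/n$, so that $P(|X_n| > t) = \operatorname{erfc}(t\sqrt{n/2})$ is available in closed form. It computes thresholds $n_1 < n_2 < \dots$ with $P(|X_n| > 1/k) < 1/k$ for $n \ge n_k$, and the resulting step function $\omega(n)$. Only finitely many thresholds can be computed, so here $\omega(n)$ is truncated at `k_max`; all helper names are hypothetical.

```python
import math

def tail_prob(n: int, t: float) -> float:
    """P(|X_n| > t) for X_n ~ N(0, 1/n), an illustrative assumption."""
    return math.erfc(t * math.sqrt(n) / math.sqrt(2.0))

def thresholds(k_max: int) -> list[int]:
    """Strictly increasing n_1 < ... < n_kmax with P(|X_n| > 1/k) < 1/k
    for all n >= n_k.

    Here the tail is decreasing in n, so the first n that works also works
    for all larger n.
    """
    ns = []
    prev = 0
    for k in range(1, k_max + 1):
        n = prev + 1  # enforce n_k > n_{k-1}, as in the proof
        while tail_prob(n, 1.0 / k) >= 1.0 / k:
            n += 1
        ns.append(n)
        prev = n
    return ns

def omega(n: int, ns: list[int]) -> int:
    """omega(n) = k for n_k <= n < n_{k+1}; omega(n) = 1 for n < n_1."""
    k = 1
    for j, n_j in enumerate(ns, start=1):
        if n >= n_j:
            k = j
    return k

ns = thresholds(8)
for n in (1, 10, 100, ns[-1]):
    w = omega(n, ns)
    # Lemma 4(ii) with delta_n = 1/omega(n): P(|X_n| > delta_n) < delta_n.
    print(n, w, tail_prob(n, 1.0 / w))
```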



It is obvious from the definitions (D11) and (D12) that $o(a_n)$ w.h.p. implies $O(a_n)$ w.h.p., and we thus have the chain of implications (where the last two are not reversible):

$$X_n = o_p(a_n) \iff X_n = o(a_n) \text{ w.h.p.} \implies X_n = O(a_n) \text{ w.h.p.} \implies X_n = O_p(a_n). \tag{5}$$
Warning. I do not think that definition (D11) is the only interpretation of "$O(a_n)$ w.h.p." that is used, so extreme care is needed when using or seeing this notation to avoid confusion and mistakes. (For example, I've heard the interpretation that "$O(a_n)$ w.h.p." should be equivalent to "$O_p(a_n)$".) The risks with "$o(a_n)$ w.h.p." seem smaller; at least, I do not know any other reasonable (non-equivalent) interpretation of it.

9. $O$ AND $o$ A.S.

In this section we assume that the random variables $X_n$ are defined together on the same probability space $\Omega$. In other words, the variables $X_n$ are coupled. (In combinatorial situations this is usually not the case, since typically each $X_n$ is defined separately on some model of "size" $n$; however, it happens, for example in a model that grows in size by some random process.) This assumption makes it possible to talk about convergence and other properties a.s., i.e., pointwise (= pathwise) for all points in the probability space $\Omega$ except for a subset with probability 0. This means that we consider the sequence $X_n(\omega)$ of real numbers separately for each point $\omega$ in the probability space. Hence, we apply definitions (D1) and (D2) for non-random sequences and obtain the following definitions.

(D13) $X_n = O(a_n)$ a.s. if for almost every $\omega \in \Omega$, there exists a number $C(\omega)$ such that $|X_n(\omega)| \le C(\omega) a_n$. In other words, $X_n = O(a_n)$ a.s. if there exists a random variable $C$ such that $|X_n| \le C a_n$ a.s. Equivalently,

$$X_n = O(a_n) \text{ a.s.} \iff \limsup_{n\to\infty} \frac{|X_n|}{a_n} < \infty \text{ a.s.} \tag{6}$$

(D14) $X_n = o(a_n)$ a.s. if for almost every $\omega \in \Omega$, $|X_n(\omega)|/a_n \to 0$. In other words, $X_n = o(a_n)$ a.s. if $X_n/a_n \xrightarrow{\mathrm{a.s.}} 0$.
It is well-known that convergence almost surely implies convergence in probability (but not conversely). Consequently, by (D14) and Lemmas 2 and 7,

$$X_n = o(a_n) \text{ a.s.} \implies X_n = o_p(a_n) \iff X_n = o(a_n) \text{ w.h.p.} \tag{7}$$

The situation for $O$ is more complicated. We first observe the implication

$$X_n = O(a_n) \text{ a.s.} \implies X_n = O_p(a_n). \tag{8}$$

(The converse does not hold, see Example 9 below.) Indeed, if $c$ is any constant, and $C$ is a random variable with $|X_n| \le C a_n$ as in (D13), then $P(|X_n| > c a_n) \le P(C > c)$, and thus Lemma 1(vi) holds because $P(C > c) \to 0$ as $c \to \infty$. Hence, Lemma 1 yields (8).

However, the following two examples show that neither of $X_n = O(1)$ a.s. and $X_n = O(1)$ w.h.p. implies the other.

Example 8. Let $X_n = X$ be independent of $n$ and let $a_n = 1$. Then $X_n = O(1)$ a.s. for every random variable $X$ (take $C = |X|$ in (D13)), but $X_n = O(1)$ w.h.p. only if $X$ is a bounded random variable (i.e., $|X| \le c$ a.s. for some constant $c$).

Example 9. Let $X_n$ be independent random variables with $P(X_n = n) = 1/n$ and $P(X_n = 0) = 1 - 1/n$, and take $a_n = 1$. By the Borel-Cantelli lemma (the events $\{X_n = n\}$ are independent and $\sum_n 1/n = \infty$), $X_n = n$ infinitely often a.s., and thus $\limsup_{n\to\infty} X_n = \infty$ a.s.; consequently $X_n$ is not $O(1)$ a.s. On the other hand, $X_n \xrightarrow{\mathrm{p}} 0$, so $X_n = o_p(1)$ and $X_n = O(1)$ w.h.p. by (5).
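The Borel-Cantelli argument in Example 9 can be made quantitative. For $2 \le N \le M$, independence gives $P(X_n = 0 \text{ for all } N \le n \le M) = \prod_{n=N}^{M} (1 - 1/n) = (N-1)/M$, which tends to 0 as $M \to \infty$ for every fixed $N$, even though each single $P(X_n \ne 0) = 1/n$ tends to 0. The sketch below (helper name hypothetical) checks the telescoping product with exact rational arithmetic.

```python
from fractions import Fraction

def prob_no_jump(start: int, stop: int) -> Fraction:
    """P(X_n = 0 for all start <= n <= stop) for independent X_n with
    P(X_n = n) = 1/n, computed exactly as a product of (1 - 1/n)."""
    result = Fraction(1)
    for n in range(start, stop + 1):
        result *= 1 - Fraction(1, n)
    return result

# The product telescopes to (start - 1) / stop, so for any fixed start it
# tends to 0 as stop grows: with probability 1 some X_n = n with n >= start.
print(prob_no_jump(10, 100))     # 9/100
print(prob_no_jump(10, 10_000))  # 9/10000
```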

Warning. In particular, there is no analogue of (7) for $O$ a.s. and $O$ w.h.p. Since "a.s." usually is a strong notion compared to others (for example for convergence), there is an obvious risk of confusion and mistakes here, and it is important to be extra careful when using "$O(a_n)$ a.s." and "$O(a_n)$ w.h.p.".



10. A FINAL WARNING 



Sometimes one sees expressions of the type $X_n = O(a_n)$ or $X_n = o(a_n)$, for some random variables $X_n$, without further qualifications or explanations. In analogy with Section 8, I think that the natural interpretations of these are the following:

(D15) $X_n = O(a_n)$ if there exists a constant $C$ such that $|X_n| \le C a_n$ (surely, or a.s.).

(D16) $X_n = o(a_n)$ if there exists a sequence $\delta_n \to 0$ such that $|X_n| \le \delta_n a_n$ (surely, or a.s.).

These notations are thus uniform estimates, and stronger than $X_n = O(a_n)$ w.h.p. and $X_n = o(a_n)$ w.h.p., since no exceptional events of small probabilities are allowed.

Remark 10. As remarked in Remark 6, in typical applications "surely" and "a.s." are equivalent. When they are not, it is presumably best to follow standard probability theory practice and ignore events of probability 0, so the interpretation "a.s." in (D15)-(D16) seems best. In this case, (D15)-(D16) are the same as (D9)-(D10), so $X_n = O(a_n) \iff X_n = O_{L^\infty}(a_n)$ and $X_n = o(a_n) \iff X_n = o_{L^\infty}(a_n)$.



Warning. However, I guess that most times one of these notations is used, (D15) or (D16) is not the intended meaning; either there are typos, or the author really means something else, presumably one of the other notions discussed above.



Remark 11. In the special situation that all $X_n$ are defined on a common probability space as in Section 9, another reasonable interpretation of $X_n = O(a_n)$ and $X_n = o(a_n)$ is $X_n = O(a_n)$ a.s. and $X_n = o(a_n)$ a.s., see (D13)-(D14). This is equivalent to allowing random $C$ or $\delta_n$ in (D15)-(D16) and is a weaker property. (This emphasizes the need for careful definitions to avoid ambiguities.)






The notations (D15) and (D16) thus risk being ambiguous. If (D15) or (D16) really is intended, it may be better to use the unambiguous notation $O_{L^\infty}$ or $o_{L^\infty}$, see Section 7 and Remark 10.